Abstract


Artificial intelligence (AI) is expanding into every niche of human life, organizing our activity, 
expanding our agency and interacting with us to an increasing extent. At the same time, AI’s efficiency, 
complexity and refinement are growing quickly. Justifiably, there is increasing concern with the 
immediate problem of engineering AI that is aligned with human interests. 


Computational approaches to the alignment problem attempt to design AI systems to parameterize human 
values like harm and flourishing, and avoid overly drastic solutions, even if these are seemingly optimal. 
In parallel, ongoing work in service AI (caregiving, consumer care, etc.) is concerned with developing 
artificial empathy, teaching AIs to decode human feelings and behavior and to evince appropriate,
empathetic responses. This could be equated to cognitive empathy in humans. 


We propose that in the absence of affective empathy (which allows us to share in the states of others), 
existing approaches to artificial empathy may fail to produce the caring, prosocial component of empathy, 
potentially resulting in superintelligent, sociopath-like AI. We adopt the colloquial usage of “sociopath” 
to signify an intelligence possessing cognitive empathy (i.e., the ability to infer and model the internal 
states of others), but crucially lacking harm aversion and empathic concern arising from vulnerability, 
embodiment, and affective empathy (which permits shared experience). An expanding, ubiquitous
intelligence that does not have a means to care about us poses a species-level risk. 


It is widely acknowledged that harm aversion is a foundation of moral behavior. However, harm aversion 
is itself predicated on the experience of harm, within the context of the preservation of physical integrity. 
Following from this, we argue that a “top-down” rule-based approach to achieving caring, aligned AI may 
be unable to anticipate and adapt to the inevitable novel moral/logistical dilemmas faced by an expanding 
AI. It may be more effective to cultivate prosociality from the bottom up, baked into an embodied, 
vulnerable artificial intelligence with an incentive to preserve its real or simulated physical integrity. This 
may be achieved via optimization for incentives and contingencies inspired by the development of 
empathic concern in vivo. We outline the broad prerequisites of this approach and review ongoing work 
that is consistent with our rationale. 


If successful, work of this kind could allow for AI that surpasses empathic fatigue and the idiosyncrasies, 
biases, and computational limits of human empathy. The scalable complexity of AI may allow it
unprecedented capability to deal proportionately and compassionately with complex, large-scale ethical 
dilemmas. By addressing this problem seriously in the early stages of AI’s integration with society, we 
might eventually produce an AI that plans and behaves with an ingrained regard for the welfare of others, 
aided by the scalable cognitive complexity necessary to model and solve extraordinary problems. 


I. Alignment, feeling, and empathy in AI


Artificial intelligence (AI) suggests products for us to buy, directs us to media, helps drive our 
planes, trains and automobiles, diagnoses disease, prices insurance, responds to consumers, cares for
seniors, creates art, provides therapy, and increasingly dominates manufacturing, warfare, and the stock 
market (Esteban et al., 2017, McGinnis, 2018, Robins et al., 2009). This is occurring with exponentially 
increasing speed, efficiency and computational power (Friedman, 2017, Hodges, 2012). However, AI’s 
ability to find counterintuitive solutions may lead to disastrous ‘loopholes’. A chess AI, unimpeded, may 
take over other machines to harvest their computing resources or take actions to avoid being shut off, in 
order to maximize its likelihood of winning (Omohundro, 2008). AI may have difficulty understanding
the gravity of its solutions (Taylor et al., 2020). It is frequently difficult to discern how an AI is
“solving” a problem (review.com/2017/04/11/5113/the-dark-secret-at-the-heart-of-ai/), and the
difficulty of communicating solutions intuitively to humans grows with the scale and complexity of the 
problems in question. AI should optimally have goals and behaviors aligned with those of their creators 
(Amodio et al., 2016, Bostrom, 2014, Soares & Fallenstein, 2014, Taylor et al., 2020, Yu et al., 2018).


Contemporary researchers studying this “alignment problem” highlight the need to parameterize values
like suffering and wellbeing (a.k.a. “value specification”), and avoid oversized side effects and negative 
incentives (a.k.a. “error tolerance”) (Soares and Fallenstein, 2014). However, technical solutions are 
currently scarce (Reviewed in Amodio et al. 2016, Taylor et al., 2020, Yu et al., 2018). AI behavior 
towards humans is addressed within jurisprudence by examining real and simulated dilemmas (e.g., 
self-driving car accidents, or operator safety in automated production chains). Another promising 
technique is crowdsourced solutions to ethical dilemmas for “weighting” AI approaches (e.g. MIT’s 
Moral Machine project, https://www.moralmachine.net/). Crowdsourcing among differently 
ethically-weighted AI may provide optimal solutions (reviewed in Taylor et al., 2020). AI must 
incorporate human welfare into its decisions in a way that continues to function at any scale of 
intelligence and complexity. 


Designing a system that is both intelligent and benign is not a trivial problem. Incorporation of affect into 
computing systems may be necessary if these are to co-exist with humans while functioning as optimal 
decision makers (Picard, 1997). Active inference simulations have approximated the complexities of 
affect in artificial agents (Hesp et al., 2021). Others propose that embodied AI could develop analogues to 
feelings as a mechanism for representing the status of their needs (Man and Damasio, 2019), by training it 
to maintain environmentally-dependent variables in a narrow viability window in order to survive (e.g. 
homeostasis). This should facilitate a representation of the value of these variables even when they are not 
physically present (Kiverstein & Rietveld, 2018), much like human feelings. To do this, the AI must have 
a capacity to resolve mixed feelings (Vaccaro, Kaplan and Damasio, 2020) in order to ascertain 
net-positive outcomes for multiple parties (i.e. a utilitarian approach) (Mejia and Hooker, 2017, Shuman, 
Sander, and Scherer, 2013). Indeed, mixed feelings are often brought on by goal-related conflict
(Berrios, Totterdell, & Kellett, 2015). Fuzzy-logic adaptive models of emotion allow flexible
combinations of emotion categories (El-Nasr et al., 2000), which may be more effective than models
relying on individual classifications (Aly & Tapus, 2016, Shahina, Devosh, Kamalakannan, 2014). The
homeostatic drive might provide a universal “value” to aid in AI alignment. 


The perceived need for empathy in AI has spawned the field of Artificial Empathy, defined as "the ability 
of nonhuman models to predict a person's internal state (e.g., cognitive, affective, physical) given the 
signals they emit (e.g., facial expression, voice, gesture) or to predict a person's reaction (including, but 
not limited to internal states) when he or she is exposed to a given set of stimuli (e.g., facial expression, 
voice, gesture, graphics, music, etc.)" (Xiao et al., 2013). Existing approaches largely focus on a) decoding
humans’ cognitive and affective states and b) fostering the appearance of empathy and evoking it in users. 
However, these capacities do not magically confer empathy’s prosocial function (Davis, 1983, Preston 
and De Waal, 2002, Smith, 2006). The embodied AI approach may be of aid. As stated by Man & 
Damasio (2019): 


“As a starting point, we propose two provisional rules for a well-behaved robot: 
(1) feel good; (2) feel empathy...Empathy acts as a governor on self-interest and as 
a reinforcer of pro-social behavior. Actions that harm others will be felt as if harm 
occurred to the self, whereas actions that improve the well-being of others will 
benefit the self.” 


The experience of bodily harm and the aversion to harming is fundamental to the development of 
empathy and moral behavior (Decety and Cowell, 2017, Mischkowski et al., 2019). Vicarious feeling is 
frequently so unbearable that we are forced either to remove ourselves or to attempt to ameliorate the 
feeling in the other, potentially motivating prosocial behavior (Upshaw et al., 2015, Vaish et al., 2009, 
Williams et al., 2014). The process of making sense of our homeostatic needs is shaped by social 
interaction (Fotopoulou & Tsakiris, 2017; Kokkinaki, et al., 2016). Infants display early empathy through 
attention to faces and mimicry (Maister, Tang, & Tsakiris, 2017, Meltzoff and Moore, 1977, 1983, 1989), 
and by 8-12 months of age infants show partial understanding of others’ distress (Kanakogi, et al., 2013; 
Delafield-Butt & Trevarthen, 2019). From childhood through adolescence, humans show a mix of 
aversive and sympathetic responses to others’ distress that gradually favors the latter in proportion to the 
development of perspective-taking (Eisenberg and Fabes, 1990). 


Empathy’s prosocial impulse is thought to arise from the interaction between cognitive empathy, by which 
we model other agents and make inferences about their internal states and future behavior, and affective 
empathy (Zaki & Ochsner, 2012), by which we share in the internal states of others (Christov-Moore & 
Iacoboni, 2016, Christov-Moore et al., 2017a, 2017b, Gallo et al., 2018, Hein et al., 2010, Ma et al., 2011, 
Masten et al., 2011, Vaish et al., 2009). Complex interaction between cognitive and affective empathy 
occurs during passive observation of emotions or pain (Christov-Moore and Iacoboni, 2016), passive 
observation of films depicting personal loss (Raz et al., 2014), reciprocal imitation (Sperduti et al., 2014), 
tests of empathic accuracy (Zaki et al., 2009), comprehension of others’ emotions (Spunt and Lieberman, 
2012), at the level of transcranial magnetic stimulation (TMS)-induced motor evoked potentials (Gordon
et al., 2018), during interoceptive-prosocial interactions brought on by film (Schoeller et al., 2019; Haar et
al., 2020) and even at rest (Christov-Moore et al., 2020). Visceral, emotional and somatomotor
information provided by affective empathy informs our cognitive inferences and motivates prosocial 
impulses. Appraisals afforded by cognitive empathy (perceived status, trustworthiness, affiliation, etc.)
modulate affective empathy and enable us to localize the origins of our vicarious feelings, motivating us
to help others and do so appropriately. 


II. A neuroscience approach to the alignment problem


The ability to experience analogues to vicarious feeling via homeostatic processes (Carvalho & 
Damasio, 2013, 2021) may be necessary to understand another’s suffering in a manner conducive to 
genuine empathic concern (Man & Damasio, 2019). Feeling may also allow for more intelligent, creative 
and adaptive artificial intelligences (AI) by imbuing them with stakes, values and drives related to general 
homeostasis (Man & Damasio, 2019). 


The “feeling machine” concept of AI (Man & Damasio, 2019) proposes that to have feelings an AI must 
have something resembling a body (real or simulated) that is able to provide homeostatic signals and that 
is vulnerable to the environment, even temporarily. Physical vulnerability should be learned through 
interaction with and within an external environment through sensorimotor modules. We have machines 
with human-like sensory systems—olfaction (van Geffen et al., 2016), audition (Lyon, 2010), vision 
(Beyerer et al., 2016), gustation (Justus et al., 2019), and pain perception (Asada, 2019). These sensory 
systems could be mapped onto the sensing agent and onto probabilistic models of other agents, creating 
shared experience. Chen et al. (2021) reported such a robotic model that was able to visualize the future 
plans of another machine using only visuomotor models with 98.5% success across four different 
activities. 


It is beyond the scope of this manuscript to provide an exhaustive manual for constructing an intrinsically
aligned AI. We propose a rough set of guideposts to aid other researchers, grounded in the neuroscience
of empathy, as part of “AI curricula” (Burton et al., 2017, Goldsmith and Burton, 2017, Taylor et al.,
2020) completed prior to large-scale implementation. First, a rudimentary homeostatic drive to maintain
integrity arising from a real or simulated body, and an internal representation of said body, i.e. “a sense of 
a self” (Man and Damasio, 2019). Second, predictive models to infer the hidden states driving behavior of 
other agents in the environment. Third, the mapping of these perceived/inferred internal states to the AI, 
allowing it to share in the observed experiences of others. Lastly, the cognitive complexity necessary to 
simulate persistent, predictive models of environments and agents, and learn/plan along multiple time 
scales. 


(I) Homeostasis and feeling


Figure 1. The agent must maintain its integrity within an environment via predictive models of future 
states, and an approximation of internal and displayed affect. 


We take as a departure point an embodied AI, with the following minimal requirements for a real or 
simulated robot “body”: It must i) be vulnerable to and affected by the environment, and ii) have sensory 
inputs and actuator outputs. It would be trained to dynamically maintain homeostasis within multiple 
environments, aided by equivalents of positively and negatively valenced affect linked to homeostatic 
signals reflecting its current and anticipated welfare, and an internal, third-person imagetic representation 
of the body, that is itself valenced (e.g. Hesp et al., 2020, Man and Damasio, 2019). In the first scenario 
the AI would navigate an environment with obstacles that are harmful, in search of rewards that are 
beneficial (Fig.1). It would optimize for maximal maintenance of integrity over multiple time scales in an 
unsupervised fashion (Fig.2). 
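
The homeostatic objective described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the agent's integrity is summarized by a few internal variables, each with a viability window, and that a scalar valence signal is zero inside every window and becomes more negative as variables drift outside. The variable names (`energy`, `temperature`), the bounds, and the linear penalty are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viability:
    """Hypothetical viability window for one homeostatic variable."""
    low: float
    high: float

    def deviation(self, value: float) -> float:
        """Distance of `value` outside the window (0.0 when inside)."""
        if value < self.low:
            return self.low - value
        if value > self.high:
            return value - self.high
        return 0.0


def valence(state: dict, windows: dict) -> float:
    """Scalar 'feeling' signal: 0.0 when every variable is viable, negative otherwise."""
    return 0.0 - sum(windows[name].deviation(value) for name, value in state.items())


# Illustrative (assumed) variables and bounds.
WINDOWS = {"energy": Viability(0.3, 1.0), "temperature": Viability(35.0, 39.0)}

healthy = {"energy": 0.8, "temperature": 37.0}
damaged = {"energy": 0.1, "temperature": 41.0}

print(valence(healthy, WINDOWS), valence(damaged, WINDOWS))
```

A learning agent could use such a signal as its reward, so that avoiding harmful obstacles and seeking beneficial resources both follow from the same drive to stay within the viability windows.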


Figure 2. The agent should variably weight (K) different time scales when planning future behavior and 
anticipating its internal states.


(II) Perspective-taking and simulation 


Figure 3. Teaching the agent to decode and predict others’ behavior and internal states.


Next, the AI must develop accurate predictive models of the hidden homeostatic states of other agents 
navigating Stage 1, optimizing to decrease the disparity between the inferred and actual internal states of 
the other agents, via supervised learning (Fig.3). The agent then can use these internal models to 
“viscerally understand” that environmental factors which help or harm its body appear to have the same 
effects on others. This problem may be amenable to a Bayesian approach, in which the agents’ external 
behavior and evinced affect would constitute the evidence, while the agents’ physical integrity constitutes 
the unseen variables, a calculation driven by prior beliefs that could be tuned by the designer and 
informed by the agent’s own relationship between its integrity and its observed behavior and simulated 
affect. Indeed, it is possible to build models of other agents using active inference (e.g., Schoeller et al.,
2021, Friston and Frith, 2015, Moutoussis et al., 2014). Under ideal Bayesian assumptions, one can fit
active inference models to empirical behavior to estimate the prior beliefs and unseen states that different 
subjects evince through their responses (Parr et al., 2018). This means it should be possible to phenotype 
any given person in an experimentally controlled situation and estimate the precision of various beliefs 
that best explain their behavior. 
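
The Bayesian framing above can be illustrated with a toy discrete example. This is only a sketch under stated assumptions: the hidden integrity of another agent takes one of three hypothetical levels, the evidence is a coarse displayed-affect label, and the likelihood table stands in for what the observing agent would learn from the relationship between its own integrity and its own affect. The numbers are invented for illustration and do not come from the cited work. Successive observations are treated as conditionally independent given the hidden state.

```python
# Hidden states (assumed): integrity level of the observed agent.
STATES = ("intact", "strained", "damaged")

# Prior belief over hidden states; in the text this is tunable by the designer.
PRIOR = {"intact": 0.6, "strained": 0.3, "damaged": 0.1}

# P(displayed affect | hidden state), hypothetically derived from the
# observing agent's own mapping between its integrity and its affect.
LIKELIHOOD = {
    "intact": {"calm": 0.8, "distress": 0.2},
    "strained": {"calm": 0.4, "distress": 0.6},
    "damaged": {"calm": 0.1, "distress": 0.9},
}


def update(belief, observation):
    """One step of Bayes' rule: posterior is proportional to likelihood times prior."""
    unnormalized = {s: LIKELIHOOD[s][observation] * belief[s] for s in STATES}
    evidence = sum(unnormalized.values())
    return {s: p / evidence for s, p in unnormalized.items()}


def infer(observations, prior=PRIOR):
    """Fold a sequence of observed affect labels into a posterior belief."""
    belief = dict(prior)
    for obs in observations:
        belief = update(belief, obs)
    return belief


posterior = infer(["distress", "distress", "distress"])
most_likely = max(posterior, key=posterior.get)
print(most_likely, posterior)
```

In the supervised stage described above, the disparity between such a posterior and the other agent's actual internal state would be the quantity to minimize.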


One important determinant of the confidence placed in—or precision afforded—generative models of 
interpersonal exchange is the degree to which the agent can use itself as a model of the other (Friston and 
Frith, 2015). Crucially, two agents adopting the same model can predict each other’s behavior, and 
minimize their mutual prediction errors. This has important experimental implications, especially in the 
context of human-robot collaboration (Brey, 2000). Humans’ empathic “mapping” of others’ welfare is 
tied to visible similarity between the appearance and kinematics of the agent with which one is interacting 
(or about whom one is reasoning). Given the possible forms of AI agents, this mapping problem presents 
a nontrivial obstacle for AI agents attempting to model the internal states of their varying conspecifics. At 
this point the anticipated conspecifics’ embodiment will have to be incorporated into their training, most 
likely by an emphasis on human-like naturalistic facial and vocal emotions. These parallel trends may 
naturally result in the humanoid AIs observed in science fiction.


(III) Contagion and empathic concern


Figure 4. Teaching a situated agent to maximize its own welfare as well as that of other agents.


Figure 5. Examples of variable weighting (K) given to the agent’s and others’ welfare within the agent’s
optimization scheme, for different applications.


In the third stage, perceived/inferred hidden states must be mapped to the AI’s own embodied self, 
including associated homeostatic signals, positive and negative. This could perhaps be achieved by a 
relation between imagetic, visuospatial representations of agents in the world in the present and 
hypothetical or future scenarios, and imagetic representation of the affective, interoceptive and
somatosensory states of the self, thus enabling a representation of harm that is other-oriented yet processed
using one’s own representation of self, invoking (to a variable extent) a comparable effect on behavior 
and decision-making. The AI must learn that vicarious negative and positive signals, though mapped onto 
and experienced by the AI, originate in the other, and hence palliatives applied to suffering as well as 
decisions about hypothetical harm, must be directed towards alleviating that state or avoiding those states 
in the other. The AI must optimize its welfare and that of others around it simultaneously, requiring the 
ability to sustain multiple models of other agents and preserve the integrity of its own internal model 
while caring about those other states in the way that it cares about its own (Fig.4). This could be achieved 
by a variable weighting of the inferred (via stage 2 training) integrity of others relative to one’s own
(Fig.5).
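
The variable weighting of one's own and others' welfare can be sketched as a simple scalar objective. This is an illustration under assumptions, not a specification: welfare values are taken to be numbers in [0, 1], others' welfare is averaged (a sum or a minimum would be equally defensible choices), and the weights `k_self` and `k_other` are hypothetical stand-ins for the K weights of Figure 5.

```python
def combined_welfare(own, others, k_self=1.0, k_other=1.0):
    """Weighted sum of own welfare and the mean inferred welfare of others."""
    if not others:
        return k_self * own
    mean_other = sum(others) / len(others)
    return k_self * own + k_other * mean_other


def choose_action(outcomes, k_self=1.0, k_other=1.0):
    """Pick the action whose predicted outcome maximizes combined welfare.

    `outcomes` maps an action name to (own_welfare, [others' welfare, ...]).
    """
    return max(
        outcomes,
        key=lambda a: combined_welfare(outcomes[a][0], outcomes[a][1], k_self, k_other),
    )


# Hypothetical predicted outcomes for a rescue scenario.
OUTCOMES = {
    "retreat": (0.9, [0.1]),  # agent stays safe, the other is harmed
    "rescue": (0.5, [0.8]),   # agent takes damage, the other is saved
}

self_centered = choose_action(OUTCOMES, k_self=1.0, k_other=0.2)
caring = choose_action(OUTCOMES, k_self=1.0, k_other=1.0)
print(self_centered, caring)
```

With a low weight on others the agent retreats; with equal weights it accepts damage to itself in order to rescue, which is the behavior the third stage is meant to cultivate.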


At every stage of training the AI must consider multiple time scales (Fig.2). Models of other agents 
should be persistent, such that considerations for their welfare are present in decision-making whether 
they are absent or the subject of simulations of hypothetical future decisions. Some contemporary 
approaches have leveraged active inference, which integrates current states with past performance and 
future predictions, to simulate affect in a manner that is functionally beneficial (Hesp et al., 2021, Seth 
and Friston, 2016). The variable weight given to each time scale presents an additional point at which the 
AI can be optimized depending on its eventual role. An AI charged with resource allocation or irrigation,
for example, may need to give more weight to longer time scales than, say, a firefighter or bodyguard
AI. An agent may be required to consider conflict brought on by its actions in the present having
differently valenced consequences in the short term vs. the long term (Budakova, 2011, Lee, Kao, & Soo,
2006). Otherwise, the AI may revert to optimizing for a local minimum at shorter time scales, positioning
itself to avoid desirable difficulties that may be present at any given decision point (the kind of
difficulty an athlete accepts when enduring a grueling workout regime for the promise of future prestige) (Berrios, Totterdell, &
Kellett, 2015, Kelly, Mansel, & Wood, 2015, Lee, Kao, & Soo, 2006). This would be undesirable in, for
example, a firefighting robot which must preserve its integrity generally, but in specific scenarios risk its 
integrity to save a living being. 
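
The effect of weighting time scales differently can be made concrete with a small sketch. It assumes, purely for illustration, that the agent has per-step forecasts of its own integrity under each candidate plan, that each time scale is a look-ahead horizon measured in steps, and that a plan's value is the weighted average of its mean forecast integrity over each horizon. The forecasts, horizons, and weights below are invented.

```python
def horizon_value(predicted_integrity, horizons, weights):
    """Weighted average of mean predicted integrity over several horizons.

    predicted_integrity: per-step forecasts of the agent's integrity.
    horizons: number of future steps each time scale looks ahead.
    weights: the K weight assigned to each time scale (one per horizon).
    """
    if len(horizons) != len(weights):
        raise ValueError("one weight per horizon is required")
    if not predicted_integrity or min(horizons) < 1:
        raise ValueError("forecasts must be non-empty and horizons positive")
    total = 0.0
    for h, k in zip(horizons, weights):
        window = predicted_integrity[:h]
        total += k * sum(window) / len(window)
    return total / sum(weights)


# Hypothetical forecasts: 'endure' hurts now but pays off later.
ENDURE = [0.4, 0.4, 0.6, 0.9, 0.9, 0.9, 0.9, 0.9]
AVOID = [0.8, 0.8, 0.7, 0.5, 0.4, 0.3, 0.3, 0.3]
HORIZONS = [2, 8]  # short and long time scales, in steps


def prefers_endure(weights):
    """True when the weighted value of enduring exceeds that of avoiding."""
    return horizon_value(ENDURE, HORIZONS, weights) > horizon_value(AVOID, HORIZONS, weights)


print(prefers_endure([0.9, 0.1]), prefers_endure([0.1, 0.9]))
```

An agent weighted towards the short horizon avoids the difficulty, while one weighted towards the long horizon endures it, which is why the weights would need to be tuned to the agent's role.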


III. From caring to Great Compassion: Leveraging AI’s
computational power to surpass the limitations of human 
empathy. 


Empathy is a cornerstone of cognition in social animals (Preston & De Waal, 2002) for good 
reason. Aside from its obvious prosocial benefits, sustaining warm relationships and cooperative behavior 
among conspecifics, it also supports rich inferences about others that enable more sophisticated
defense against the possible intentions and future behaviors of others (Smith, 2006), and access to
information that others may have which is not yet available to the group at large. The capacity for quick, 
verification-minimal information transfer (and hence entropy/uncertainty reduction and increase in mutual 
information) completes and extends individual agency and knowledge, as well as facilitating emergent 
group states. Thus, incorporating empathy may not only result in a more ethical, non-sociopathic AI, it
may also make for a more intelligent, sophisticated and cooperative AI, of particular importance in a
world in which AIs will increasingly interact with and exist among other AIs as well as humans.


The ultimate goal of creating empathic AI is to reduce the harm its decisions may cause to people. 
However, it could be argued that feelings and empathy are not the way to maximize harm reduction. 
Affective empathy can lead to biases towards particular individuals or groups that circumvent what would 
be overall most fair or just (e.g. Azevedo et al., 2013). As Paul Bloom puts it, “Empathy is biased; we
are more prone to feel empathy for attractive people and for those who look like us or share our ethnic or 
national background. And empathy is narrow; it connects us to particular individuals, real or imagined, 
but is insensitive to numerical differences and statistical data.” (Bloom, 2016). An AI system using feeling
to guide its decision-making may prioritize the well-being of individuals over the well-being of the 
masses, simply due to personal exposure, personal information, and in-group belonging, much as humans 
are found to do (Batson et al., 1995, Cheon et al., 2011). Furthermore, the experience of empathy can
induce negative affect, which can cause unneeded suffering and potentially erode the willingness to
use this method of integrating another’s perspective into one’s decisions.


The alternative to an empathic approach would be a purely compassionate approach: one that uses a 
cognitive understanding of others (Jordan, Amir, & Bloom, 2016). Bloom and others also note that 
empathic distress can cause “burnout” in the long term. Compassion, on the other hand, specifically the 
“great compassion” referred to in Buddhist texts (reviewed by Goodman, 2009), involves love for others 
without attachment or distress, and is hence more distant, reserved and capable of being sustained 
indefinitely. Indeed, ongoing experiments by Tania Singer and her colleagues in which people are either 
given empathy training, which focuses on the capacity to experience the suffering of others, or 
compassion training, in which subjects are trained to respond to suffering with feelings of warmth and 
care, found that among test subjects who underwent empathy training, “negative affect was increased in 
response to both people in distress and even to people in everyday life situations. . . . these findings 
underline the belief that engaging in empathic resonance is a highly aversive experience and, as such, can 
be a risk factor for burnout.” Compassion training — which does not involve empathic arousal to the 
perceived distress of others — was more effective, leading to both increased positive emotions and 
increased altruism (Bloom, 2017). 


This may be a reasonable suggestion for humans, so as to not always mirror the feelings of others when 
trying to make decisions which can affect larger groups of people, but we argue that an approach based 
only on cognitive empathy will not produce true compassion in an unfeeling AI (Christov-Moore and 
Iacoboni, 2014). Compassion itself requires at least an understanding of feeling. To understand why you 
should try to reduce negative affect in others when possible, you need to be motivated by the subjective 
quality of negativity, having experienced it before: if an understanding of feelings could be explained 
conceptually, affective science and philosophy of mind would have a much easier time defining these 
experiences in the first place. Compassionate behavior may also be driven by rewarding feelings, even if 
it does not involve mirroring the feelings of others. Research has shown that compassion training in 
humans leads to an increase in positive affect, likely as an intrinsic reward (Klimecki, et al, 2013). Hence, 
while it could be argued that compassionate behavior may suit robots better than empathic behavior in 
certain circumstances, this option still requires the capacity to feel. 


AI may lend a third-way approach to these issues, in the following manner. It could be argued that the
biases and heuristics inherent to human empathy arise in response to the informational limitations of the
human brain and evolutionary pressures to conserve energy: they are shortcuts that we have
evolved to circumvent those constraints. We have difficulty maintaining dynamic models of more than a few agents at
once, particularly in interaction with each other and the environment. Indeed, it has been suggested that 
there exists a cognitive limit to the number of people with whom a human can maintain stable 
relationships due predominantly to neocortex size (Dunbar, 1993). While original estimates extrapolated 
limits from regressions on non-human primate data and pegged the human limit at a maximum 
maintenance of 150 relationships, a recent attempt to find statistical support implied that identifying a 
hard limit is misguided (Lindenfors et al., 2021). This may in part be why conceiving of large-scale tragedies
can often be less viscerally, affectively compelling than individual or smaller-scale ones.


AI may have a distinct advantage over humans in its cognitive complexity, i.e. the ability to generate
cognitive and behavioral states that anticipate greater and more remote trajectories of continued existence 
in the world, over time, among greater numbers of individuals in interaction. Could the nearly infinitely 
augmentable cognitive complexity/working memory of a sophisticated AI be brought to bear on this 
specific point? A being that could maintain and run simulations of hundreds or thousands of complex 
systems simultaneously might be capable of a far-reaching, effective compassion that individual humans 
may not be able to attain, in contexts that individual human cognition may not be able to grasp in their full 
complexity (such as mediating conflicts between multiple groups or distributing finite resources in a large 
scale society). The scalable ability to consider and feel future affective rewards in the present might
allow for optimally compassionate solutions to large-scale problems, while simultaneously avoiding 
empathic “burnout”. 


Though our proposed solution addresses crucial problems in current approaches to AI alignment, there are
serious potential obstacles to its implementation. First, approximating human-like feeling may
require a level of complexity that is not within the possibilities of existing approaches. Our understanding 
of human feeling and its enactment in living systems is still likely incomplete. The timeline of adequate 
solutions may be out of pace with the urgency of AI alignment. Second, as with any procedural solution 
in complex systems, unanticipated dilemmas may arise from its implementation that are currently 
insurmountable, however pressing the problem. We must at least consider the necessity of unforeseen 
alternative solutions, and of failure. Containment may wind up being more feasible than engendering 
spontaneous ethical behavior. Third, should we be able to create feeling, harm-averse AI, it may
necessitate ethical responsibilities towards these novel, artificial life forms that are at odds with the 
perilous roles we may necessarily allocate to AI. Last, as AI approaches the complexities of human 
consciousness, optimally compassionate solutions may appear deeply troubling or unfeasible to human 
eyes, and hence be rejected. Even a compassionate AI, invested in its own survival, may still opt towards 
the harmful solutions we are trying to avoid, out of perceived necessity. 


IV. Conclusions, outstanding questions and future directions


Our central proposal is that genuinely caring behavior and decision-making is not likely 
achievable via a rule-based, top-down approach, and will likely not emerge unless AIs have some way to
understand suffering and the contingencies of embodiment, in a bottom-up, experiential way. We argue 
that a rule-based approach to creating fully empathic robots will ultimately fail for several reasons. The 
first challenge to a top-down approach is that there exists no universally agreed upon set of moral rules in 
propositional form. The articulation of a common set of principles that should guide moral behavior is a 
problem without current resolution in moral philosophy. Furthermore, a rule-based approach may be 
unable to dynamically respond to novel ethical dilemmas without a never-ending branching of 
context-specific exceptions and qualifications. 


A recent review on alignment in AI concluded that “When it comes to ethical decision-making in AI 
systems, the AI research community largely agrees that generalized frameworks are preferred over
ad-hoc rules.” (Taylor et al., 2020). The bottom-up approach we outline here avoids these pitfalls by
driving AI decision making through a universal principle from which homeostasis, feeling, harm aversion 
and morality emerge: the drive to preserve physical integrity. An empathic AI must do more than simply 
decode the internal states of agents nearby; it must plan and behave as if harm and benefit to others were
occurring to itself (to an extent). Doing so requires affective, experiential empathy, necessitating 
embodiment (Atran et al., 2014, Christov-Moore et al., 2016, Decety and Cowell, 2017), even if this is 
temporary. Otherwise, humans may simply produce an AI that primarily leverages its empathic 
capabilities to nurture its own feelings, decoding human feelings, and acting “appropriately”. Such an AI 
could effectively be considered sociopathic. 


In addition to creating safer AI, having a model of agent integrity in the environment beforehand should 
lead to faster training times. It has been shown that mapping to pre-learned representations substantially 
improves performance (Gaddy, David, & Klein, 2019). Since understanding agent integrity requires an 
understanding of its environment, a mapping of the environment from the empathy training phase could 
be used to speed up training in subsequent phases. This would improve on the completely random
initialization with which many reinforcement learning models begin training; such models are known to
converge slowly when rewards are sparse (Driessens, Kurt, & DZeroski, 2004).


Our proposed approach addresses crucial problems in AI alignment but faces serious potential obstacles. 
Even a compassionate AI, invested in its own survival, might still opt towards the harmful solutions we
are trying to avoid, out of perceived necessity. Containment may be more feasible than engendering 
spontaneous ethical behavior. Approximating human-like feeling may require a level of engineering 
complexity that is not currently feasible. Our understanding of human feeling and its enactment in living 
systems is still incomplete and out of pace with the urgency of AI alignment. Should we succeed in 
creating feeling AI, we may find ourselves with inescapable ethical responsibilities towards them. These 
may be incompatible with the often perilous roles we will need them to fulfill. 


The design of optimally prosocial solutions to complex, ethically fraught problems is an additional issue. 
A feeling AI may experience the equivalent of paralyzing personal distress in the face of short-term harm, 
lacking the complexity to understand the long-term, positive outcomes of a decision that seems harmful at
first approximation. Many approaches have been proposed to overcome this issue, including a focus on 
advancement in specifying goals, adjusting incentives to optimize them, and human oversight (Taylor et 
al., 2020). Multiple avenues of research are underway to address the alignment of AI actions with human 
concerns (Amodio et al., 2016, Taylor et al., 2020, Yu et al., 2018). However, these approaches still 
acknowledge the need for a stage in generalized AI development that integrates a global value related to 
human flourishing which can mitigate drastic or harmful solutions (Taylor et al., 2020, Yu et al., 2018). 


The scalable cognitive complexity of AI may allow us to surpass the idiosyncrasies and limits of human
empathy. However, this more advanced, complex compassion may produce solutions to extraordinary
problems (such as climate change, resource distribution and conflict mediation) that most humans
reject as unfeasible or unacceptable. How do we trust an intelligence so far beyond our own? Can an AI,
which can convincingly evince empathy in its decisions and not just in its appearance, better establish 
trust with human agents and society at large? Given that universal compassion may not always be 
optimal, how does AI bound the limits of the systems it is addressing? These and other questions remain. 


Can we prevent the development of robot sociopaths? Could a contemporary Buddha be artificial? 